Abstract. We consider infinite-state turn-based stochastic games of two players,
□ and ◇, who aim at maximizing and minimizing the expected total reward
accumulated along a run, respectively. Since the total accumulated reward is
unbounded, the determinacy of such games cannot be deduced directly from
Martin's determinacy result for Blackwell games. Nevertheless, we show that these
games are determined both for unrestricted (i.e., history-dependent and
randomized) strategies and deterministic strategies, and the equilibrium value is the same.
Further, we show that these games are generally not determined for memoryless
strategies. Then, we consider a subclass of ◇-finitely-branching games and show
that they are determined for all of the considered strategy types, where the
equilibrium value is always the same. We also examine the existence and type of
(ε-)optimal strategies for both players.



1 Introduction 

Turn-based stochastic games of two players are a standard model of discrete systems 
that exhibit both non-deterministic and randomized choice. One player (called □ or 
Max in this paper) corresponds to the controller who wishes to achieve/maximize some 
desirable property of the system, and the other player (called ◇ or Min) models the
environment which aims at spoiling the property. Randomized choice is used to model 
events such as system failures, bit-flips, or coin-tossing in randomized algorithms. 

Technically, a turn-based stochastic game (SG) is defined as a directed graph where 
every vertex is either stochastic or belongs to one of the two players. Further, there is a 
fixed probability distribution over the outgoing transitions of every stochastic vertex. A 
play of the game is initiated by putting a token on some vertex. Then, the token is moved 
from vertex to vertex by the players or randomly. A strategy specifies how a player 
should play. In general, a strategy may depend on the sequence of vertices visited so 
far (we say that the strategy is history-dependent (H)), and it may specify a probability 
distribution over the outgoing transitions of the currently visited vertex rather than a 
single outgoing transition (we say that the strategy is randomized (R)). Strategies that
do not depend on the history of a play are called memoryless (M), and strategies that
do not randomize (i.e., select a single outgoing transition) are called deterministic (D).
Thus, we obtain the MD, MR, HD, and HR strategy classes, where HR are unrestricted
strategies and MD are the most restricted memoryless deterministic strategies.

A game objective is usually specified by a payoff function which assigns some real
value to every run (infinite path) in the game graph. The aim of Player □ is to maximize
the expected payoff, while Player ◇ aims at minimizing it. It has been shown in [22]
that for bounded and Borel payoff functions, Martin's determinacy result for Blackwell
games [23] implies that

$$\sup_{\sigma \in HR_\Box} \inf_{\pi \in HR_\Diamond} \mathbb{E}_v^{\sigma,\pi}[\mathrm{Payoff}] \;=\; \inf_{\pi \in HR_\Diamond} \sup_{\sigma \in HR_\Box} \mathbb{E}_v^{\sigma,\pi}[\mathrm{Payoff}] \tag{1}$$

where $HR_\Box$ and $HR_\Diamond$ are the classes of HR strategies for Player □ and Player ◇,
respectively. Hence, every vertex v has a HR value $Val_{HR}(v)$ specified by (1). A HR strategy is
optimal if it achieves the outcome $Val_{HR}(v)$ or better against every strategy of the other
player. In general, optimal strategies are not guaranteed to exist, but (1) implies that
both players have ε-optimal HR strategies for every ε > 0 (see Section 2 for precise
definitions).

The determinacy results of [23, 22] cannot be applied to unbounded payoff functions,
i.e., these results do not imply that (1) holds if Payoff is unbounded, and they do
not say anything about the existence of a value for restricted strategy classes such as MD
or MR. In the context of performance analysis and controller synthesis, these questions
arise naturally; in some cases, the players cannot randomize or remember the history of a
play, and some of the studied payoff functions are not bounded. In this paper, we study 
these issues for the total accumulated reward payoff function and infinite-state games. 

The total accumulated reward payoff function, denoted by Acc, is defined as follows. 
Assume that every vertex v is assigned a fixed non-negative reward r(v). Then Acc 
assigns to every run the sum of the rewards of all vertices visited along the run. Obviously,
Acc is unbounded in general, and may even take the value ∞. A special case of total
accumulated reward is termination time, where all vertices are assigned reward 1, except
for terminal vertices that are assigned reward 0 (we also assume that the only outgoing
transition of every terminal vertex t is a self-loop on t). Then, $\mathbb{E}_v^{\sigma,\pi}[\mathrm{Acc}]$ corresponds to
the expected termination time under the strategies σ, π. Another special (and perhaps
simplest) case of total accumulated reward is reachability, where the target vertices
are assigned reward 1 and the other vertices have zero reward (here we assume that
every target vertex has a single outgoing transition to a special state s with zero reward,
where s → s is the only outgoing transition of s). Although the reachability payoff is
bounded, some of our negative results about the total accumulated reward hold even for 
reachability (see below). 

The reason for considering infinite-state games is that many recent works study
various algorithmic problems for games over classical automata-theoretic models, such
as pushdown automata [15, 16, 17, 14, 9, 8], lossy channel systems [3, 2], one-counter
automata [7, 5, 6], or multicounter automata [18, 11, 10, 21, 12, 4], which are finitely
representable but the underlying game graph is infinite and sometimes even
infinitely-branching (see, e.g., [11, 10, 21]). Since the properties of finite-state games do not carry
over to infinite-state games in general (see, e.g., [20]), the above issues need to be
revisited and clarified explicitly, which is the main goal of this paper.

Our contribution: We consider general infinite-state games, which may contain
vertices with infinitely many outgoing transitions, and ◇-finitely-branching games,
where every vertex of $V_\Diamond$ has finitely many outgoing transitions, with the total
accumulated reward objective. For general games, we show the following:

- Every vertex has both a HR and a HD value, and these values are equal.¹

- There is a vertex v of a game G with reachability objective such that v has neither
MD nor MR value. Further, the game G has only one vertex (belonging to Player ◇)
with infinitely many outgoing transitions.

It follows from previous works (see, e.g., [8, 20]) that optimal strategies in general
games may not exist, and even if they do exist, they may require infinite memory.
Interestingly, we observe that an optimal strategy for Player □ (if it exists) may also require
randomization in some cases.

For ◇-finitely-branching games, we prove the following results:

- Every vertex has a HR, HD, MR, and MD value, and all of these values are equal.

- Player ◇ has an optimal MD strategy in every vertex.

It follows from the previous works that Player □ may not have an optimal strategy and
even if he has one, it may require infinite memory. Let us note that in finite-state games,
both players have optimal MD strategies (see, e.g., [19]).

Our results are obtained by generalizing the arguments for reachability objectives
presented in [8], but there are also some new observations based on original ideas and
new counterexamples. In particular, this applies to the existence of a HD value and the
non-existence of MD and MR values in general games.

2 Preliminaries 

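In this paper, the sets of all positive integers, non-negative integers, rational numbers,
real numbers, and non-negative real numbers are denoted by $\mathbb{N}$, $\mathbb{N}_0$, $\mathbb{Q}$, $\mathbb{R}$, and $\mathbb{R}^{\geq 0}$,
respectively. We also use $\mathbb{R}^{\geq 0}_\infty$ to denote the set $\mathbb{R}^{\geq 0} \cup \{\infty\}$, where ∞ is treated according
to the standard conventions. For all $c \in \mathbb{R}^{\geq 0}_\infty$ and $\varepsilon \in [0, \infty)$, we define the lower and
upper ε-approximation of c, denoted by $c \ominus \varepsilon$ and $c \oplus \varepsilon$, respectively, as follows:

$$\begin{aligned}
c \oplus \varepsilon &= c + \varepsilon && \text{for all } c \in \mathbb{R}^{\geq 0}_\infty \text{ and } \varepsilon \in [0, \infty),\\
c \ominus \varepsilon &= c - \varepsilon && \text{for all } c \in \mathbb{R}^{\geq 0} \text{ and } \varepsilon \in [0, \infty),\\
\infty \ominus \varepsilon &= 1/\varepsilon && \text{for all } \varepsilon \in (0, \infty),\\
\infty \ominus 0 &= \infty .
\end{aligned}$$

Given a set V, the elements of $(\mathbb{R}^{\geq 0}_\infty)^V$ are written as vectors x, y, ..., where $x_v$ denotes
the v-component of x for every v ∈ V. The standard component-wise ordering on $(\mathbb{R}^{\geq 0}_\infty)^V$
is denoted by ⊑.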
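As a small illustration of the conventions above, the two approximation operators can be written as ordinary functions on floating-point numbers. This is only a sketch: the function names `upper_approx` and `lower_approx` are hypothetical, and `math.inf` is assumed to stand in for ∞.

```python
import math


def upper_approx(c: float, eps: float) -> float:
    """c (+) eps: plain addition; infinity stays infinity."""
    return c + eps


def lower_approx(c: float, eps: float) -> float:
    """c (-) eps: c - eps for finite c, 1/eps for c = inf and eps > 0, inf for eps = 0."""
    if math.isinf(c):
        return math.inf if eps == 0 else 1.0 / eps
    return c - eps
```

Note that the lower approximation of ∞ grows without bound as ε tends to 0, which is what allows ε-optimality to be stated uniformly for vertices whose value is infinite.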

For every finite or countably infinite set M, a binary relation $\to\, \subseteq M \times M$ is total if
for every m ∈ M there is some n ∈ M such that m → n. A finite path in $\mathcal{M} = (M, \to)$
is a finite sequence $w = m_0, \ldots, m_k$ such that $m_i \to m_{i+1}$ for every i, where 0 ≤ i < k.
The length of w, i.e., the number of transitions performed along w, is denoted by |w|. A
run in $\mathcal{M}$ is an infinite sequence $\omega = m_0, m_1, \ldots$ every finite prefix of which is a path.
We also use ω(i) to denote the element $m_i$ of ω, and $\omega_i$ to denote the run $m_i, m_{i+1}, \ldots$
Given m, n ∈ M, we say that n is reachable from m, written $m \to^* n$, if there is a finite
path from m to n. The sets of all finite paths and all runs in $\mathcal{M}$ are denoted by $Fpath(\mathcal{M})$
and $Run(\mathcal{M})$, respectively. For every finite path w, we use $Run(\mathcal{M}, w)$ and $Fpath(\mathcal{M}, w)$
to denote the set of all runs and finite paths, respectively, prefixed by w. If $\mathcal{M}$ is clear
from the context, we write just Run, Run(w), Fpath and Fpath(w) instead of $Run(\mathcal{M})$,
$Run(\mathcal{M}, w)$, $Fpath(\mathcal{M})$ and $Fpath(\mathcal{M}, w)$, respectively.

¹ For a given strategy type T (such as MD or MR), we say that a vertex v has a T value if
$\sup_{\sigma \in T_\Box} \inf_{\pi \in T_\Diamond} \mathbb{E}_v^{\sigma,\pi}[\mathrm{Payoff}] = \inf_{\pi \in T_\Diamond} \sup_{\sigma \in T_\Box} \mathbb{E}_v^{\sigma,\pi}[\mathrm{Payoff}]$, where $T_\Box$ and $T_\Diamond$ are the classes
of all T strategies for Player □ and Player ◇, respectively.

Now we recall basic notions of probability theory. Let A be a finite or countably
infinite set. A probability distribution on A is a function $f : A \to \mathbb{R}^{\geq 0}$ such that
$\sum_{a \in A} f(a) = 1$. A distribution f is rational if f(a) ∈ $\mathbb{Q}$ for every a ∈ A, positive if
f(a) > 0 for every a ∈ A, Dirac if f(a) = 1 for some a ∈ A, and uniform if A is finite
and f(a) = 1/|A| for every a ∈ A. A σ-field over a set X is a set $\mathcal{F} \subseteq 2^X$ that includes X and
is closed under complement and countable union. A measurable space is a pair $(X, \mathcal{F})$
where X is a set called sample space and $\mathcal{F}$ is a σ-field over X. A probability measure
over a measurable space $(X, \mathcal{F})$ is a function $\mathcal{P} : \mathcal{F} \to \mathbb{R}^{\geq 0}$ such that, for each countable
collection $\{X_i\}_{i \in I}$ of pairwise disjoint elements of $\mathcal{F}$, $\mathcal{P}(\bigcup_{i \in I} X_i) = \sum_{i \in I} \mathcal{P}(X_i)$, and
moreover $\mathcal{P}(X) = 1$. A probability space is a triple $(X, \mathcal{F}, \mathcal{P})$ where $(X, \mathcal{F})$ is a measurable
space and $\mathcal{P}$ is a probability measure over $(X, \mathcal{F})$.

Definition 1. A stochastic game is a tuple $G = (V, \to, (V_\Box, V_\Diamond, V_\bigcirc), Prob)$ where V is
a finite or countably infinite set of vertices, $\to\, \subseteq V \times V$ is a total transition relation,
$(V_\Box, V_\Diamond, V_\bigcirc)$ is a partition of V, and Prob is a probability assignment which to each
$v \in V_\bigcirc$ assigns a positive probability distribution on the set of its outgoing transitions.
We say that G is ◇-finitely-branching if for each $v \in V_\Diamond$ there are only finitely many
u ∈ V such that v → u.

Strategies. A stochastic game G is played by two players, □ and ◇, who select the
moves in the vertices of $V_\Box$ and $V_\Diamond$, respectively. Let ⊙ ∈ {□, ◇}. A strategy for Player ⊙
in G is a function which to each finite path in G ending in a vertex $v \in V_\odot$ assigns a
probability distribution on the set of outgoing transitions of v. We say that a strategy τ
is memoryless (M) if τ(w) depends just on the last vertex of w, and deterministic (D)
if it returns a Dirac distribution for every argument. Strategies that are not necessarily
memoryless are called history-dependent (H), and strategies that are not necessarily
deterministic are called randomized (R). Thus, we obtain the MD, MR, HD, and HR
strategy types. The set of all strategies for Player ⊙ of type T in a game G is denoted
by $T_\odot^G$, or just by $T_\odot$ if G is understood (for example, $MR_\Box$ denotes the set of all MR
strategies for Player □).

Every pair of strategies $(\sigma, \pi) \in HR_\Box \times HR_\Diamond$ and an initial vertex v determine a
unique probability space $(Run(v), \mathcal{F}, \mathcal{P}_v^{\sigma,\pi})$, where $\mathcal{F}$ is the σ-field over Run(v)
generated by all Run(w) such that w starts with v, and $\mathcal{P}_v^{\sigma,\pi}$ is the unique probability
measure such that for every finite path $w = v_0, \ldots, v_k$ initiated in v we have that
$\mathcal{P}_v^{\sigma,\pi}(Run(w)) = \prod_{i=0}^{k-1} x_i$, where $x_i$ is the probability of $v_i \to v_{i+1}$ assigned either by
$\sigma(v_0, \ldots, v_i)$, $\pi(v_0, \ldots, v_i)$, or $Prob(v_i)$, depending on whether $v_i$ belongs to $V_\Box$, $V_\Diamond$,
or $V_\bigcirc$, respectively (in the case when k = 0, i.e., w = v, we put $\mathcal{P}_v^{\sigma,\pi}(Run(w)) = 1$).
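The product formula for the probability of a cylinder set Run(w) can be sketched directly. The following is a minimal illustration, assuming strategies are given as Python callables from a history (a tuple of vertices) to a distribution over successors. The names `cylinder_probability`, `OWNER`, `PROB`, `sigma`, and `pi`, and the three-vertex toy game, are hypothetical and not taken from the paper.

```python
from typing import Callable, Dict, Sequence, Tuple

Dist = Dict[str, float]
Strategy = Callable[[Tuple[str, ...]], Dist]


def cylinder_probability(path: Sequence[str], owner: Dict[str, str],
                         sigma: Strategy, pi: Strategy,
                         prob: Dict[str, Dist]) -> float:
    """Probability of Run(w) for w = path: the product of one-step probabilities."""
    p = 1.0  # the empty product covers the case k = 0
    for i in range(len(path) - 1):
        v, nxt = path[i], path[i + 1]
        history = tuple(path[: i + 1])
        if owner[v] == "max":      # vertex of Player Max (box)
            dist = sigma(history)
        elif owner[v] == "min":    # vertex of Player Min (diamond)
            dist = pi(history)
        else:                      # stochastic vertex
            dist = prob[v]
        p *= dist.get(nxt, 0.0)
    return p


# Toy game (an assumption for illustration): a -> b, b -> a or c with
# probability 1/2 each, c -> c.
OWNER = {"a": "max", "b": "rand", "c": "min"}
PROB = {"b": {"a": 0.5, "c": 0.5}}


def sigma(history: Tuple[str, ...]) -> Dist:
    return {"b": 1.0}   # deterministic, memoryless choice in a


def pi(history: Tuple[str, ...]) -> Dist:
    return {"c": 1.0}   # deterministic, memoryless choice in c
```

For the path a, b, a, b, c the two stochastic steps each contribute a factor 1/2, so the cylinder has probability 1/4 under these strategies.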

Determinacy, optimal strategies. In this paper, we consider games with the total accu- 
mulated reward objective and reachability objective, where the latter is understood as a 
restricted form of the former (see below). 

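Let $r : V \to \mathbb{R}^{\geq 0}$ be a reward function, and $\mathrm{Acc} : Run \to \mathbb{R}^{\geq 0}_\infty$ a function which to
every run ω assigns the total accumulated reward $\mathrm{Acc}(\omega) = \sum_{i=0}^{\infty} r(\omega(i))$. Let T be a
strategy type. We say that a vertex v ∈ V has a T-value in G if

$$\sup_{\sigma \in T_\Box} \inf_{\pi \in T_\Diamond} \mathbb{E}_v^{\sigma,\pi}[\mathrm{Acc}] \;=\; \inf_{\pi \in T_\Diamond} \sup_{\sigma \in T_\Box} \mathbb{E}_v^{\sigma,\pi}[\mathrm{Acc}]$$

where $\mathbb{E}_v^{\sigma,\pi}[\mathrm{Acc}]$ denotes the expected value of Acc in $(Run(v), \mathcal{F}, \mathcal{P}_v^{\sigma,\pi})$. If v has a
T-value, then $Val_T(v, r, G)$ (or just $Val_T(v)$ if G and r are clear from the context)
denotes the T-value of v defined by this equality.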

Let Q be a class of games. If every vertex of every G ∈ Q has a T-value for every
reward function, we say that Q is T-determined. Note that Acc is generally not bounded,
and therefore we cannot directly apply the results of [23, 22] to conclude that the class
of all games is HR-determined. Further, these results do not say anything about
determinacy for the other strategy types even for bounded objective functions.

If a given vertex v has a T-value, we can define the notion of ε-optimal T strategy
for both players.

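Definition 2. Let v be a vertex which has a T-value, and let ε ≥ 0. We say that

- $\sigma \in T_\Box$ is ε-T-optimal in v if $\mathbb{E}_v^{\sigma,\pi}[\mathrm{Acc}] \geq Val_T(v) \ominus \varepsilon$ for all $\pi \in T_\Diamond$;

- $\pi \in T_\Diamond$ is ε-T-optimal in v if $\mathbb{E}_v^{\sigma,\pi}[\mathrm{Acc}] \leq Val_T(v) \oplus \varepsilon$ for all $\sigma \in T_\Box$.

A 0-T-optimal strategy is called T-optimal.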

In this paper we also consider reachability objectives, which can be seen as a
restricted form of the total accumulated reward objectives introduced above. A "standard"
definition of the reachability payoff function looks as follows: We fix a set R ⊆ V of
target vertices, and define a function Reach : Run → {0, 1} which to every run
assigns either 1 or 0 depending on whether or not the run visits a target vertex. Note
that $\mathbb{E}_v^{\sigma,\pi}[\mathrm{Reach}]$ is the probability of visiting a target vertex in the corresponding play
of G. Obviously, if we assign reward 1 to the target vertices and 0 to the others, and
replace all outgoing transitions of target vertices with a single transition leading to a fresh
stochastic vertex u with reward 0 and only one transition u → u, then $\mathbb{E}_v^{\sigma,\pi}[\mathrm{Reach}]$ in the
original game is equal to $\mathbb{E}_v^{\sigma,\pi}[\mathrm{Acc}]$ in the modified game. Further, if the original game
was ◇-finitely-branching or finite, then so is the modified game. Therefore, all
"positive" results about the total accumulated reward objective (e.g., determinacy, existence
of T-optimal strategies, etc.) achieved in this paper carry over to the reachability
objective, and all "negative" results about reachability carry over to the total accumulated
reward.
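The reduction from reachability to total accumulated reward described above is a purely syntactic transformation of the game graph. A minimal sketch for finite games follows, assuming the game is stored in dictionaries; the function name `reach_to_acc`, the owner labels, and the default sink name are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def reach_to_acc(succ: Dict[str, List[str]], owner: Dict[str, str],
                 prob: Dict[str, Dict[str, float]], targets: Set[str],
                 sink: str = "sink*") -> Tuple[dict, dict, dict, dict]:
    """Redirect every target vertex to a fresh zero-reward stochastic sink."""
    assert sink not in succ, "the sink must be a fresh vertex"
    new_succ = {v: list(us) for v, us in succ.items()}
    new_owner = dict(owner)
    new_prob = {v: dict(d) for v, d in prob.items()}
    for t in targets:
        new_succ[t] = [sink]              # single outgoing transition t -> sink
        if owner[t] == "rand":
            new_prob[t] = {sink: 1.0}     # Dirac distribution on the new transition
    new_succ[sink] = [sink]               # the sink only loops on itself
    new_owner[sink] = "rand"
    new_prob[sink] = {sink: 1.0}
    # reward 1 on targets, 0 everywhere else (including the sink)
    reward = {v: (1.0 if v in targets else 0.0) for v in new_succ}
    return new_succ, new_owner, new_prob, reward
```

Because every run collects reward 1 at most once (on its first visit to a target, after which it is trapped in the sink), the accumulated reward of a run in the modified game is 1 exactly when the run visits a target.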
Fig. 1. Player □ has an MR-optimal strategy in v, but no HD-optimal strategy in v. All vertices
are labelled by pairs of the form vertex name:reward.

3 Results 

Our main results about the determinacy of general stochastic games with the total ac- 
cumulated reward payoff function are summarized in the following theorem: 

Theorem 3. Let Q be the class of all games. Then

a) Q is both HR-determined and HD-determined. Further, for every vertex v of every
G ∈ Q and every reward function r we have that $Val_{HR}(v) = Val_{HD}(v)$.

b) Q is neither MD-determined nor MR-determined, and these results hold even for
reachability objectives.

An optimal strategy for Player □ does not necessarily exist, even if G is a game with
a reachability payoff function such that $V_\Diamond = \emptyset$ and every vertex of $V_\Box$ has at most
two outgoing transitions (see, e.g., [8, 20]). In fact, it suffices to consider the vertex v of
Fig. 2, where the depicted game is modified by replacing the vertex u with a stochastic
vertex u', where u' → u' is the only outgoing transition of u', and u' is the only target
vertex (note that all vertices in the first two rows become unreachable and can be safely
deleted). Clearly, $Val_{HR}(v) = 1$, but Player □ has no optimal strategy.

Similarly, an optimal strategy for Player ◇ may not exist even if $V_\Box = \emptyset$ [8, 20]. To
see this, consider the vertex u of Fig. 2, where t is the only target vertex and the depicted
game is modified by redirecting the only outgoing transition of p back to u (this makes
all vertices in the last two rows unreachable). We have that $Val_{HR}(u) = 0$, but Player ◇
has no optimal strategy.

One may also be tempted to think that if Player □ (or Player ◇) has some optimal
strategy, then he also has an optimal MD strategy. However, optimal strategies generally
require infinite memory even for reachability objectives (this holds for both players).
Since the corresponding counterexamples are not completely trivial, we refer to [20] for
details. Interestingly, an optimal strategy for Player □ may also require randomization.
Consider the vertex v of Fig. 1. Let $\sigma^* \in MR_\Box$ be a strategy selecting $v \to q_n$ with
probability $1/2^n$. Since $V_\Diamond = \emptyset$, we have that $\inf_{\pi \in HR_\Diamond} \mathbb{E}_v^{\sigma^*,\pi}[\mathrm{Acc}] = \infty = Val_{HR}(v)$.
However, for every $\sigma \in HD_\Box$ we have that $\inf_{\pi \in HR_\Diamond} \mathbb{E}_v^{\sigma,\pi}[\mathrm{Acc}] < \infty$.
For ◇-finitely-branching games, the situation is somewhat different, as our second
main theorem reveals.

Theorem 4. Let Q be the class of all ◇-finitely-branching games. Then Q is
HR-determined, HD-determined, MR-determined, and MD-determined, and for every
vertex v of every G ∈ Q and every reward function r we have that

$$Val_{HR}(v) = Val_{HD}(v) = Val_{MR}(v) = Val_{MD}(v) .$$

Further, for every G ∈ Q there exists a MD strategy for Player ◇ which is optimal in
every vertex of G.

An optimal strategy for Player □ may not exist in ◇-finitely-branching games, and even
if it does exist, it may require infinite memory [20].

Theorems 3 and 4 are proven by a sequence of lemmas presented below. For the rest
of this section, we fix a stochastic game $G = (V, \to, (V_\Box, V_\Diamond, V_\bigcirc), Prob)$ and a reward
function $r : V \to \mathbb{R}^{\geq 0}$. We start with the first part of Theorem 3 (a), i.e., we show that
every vertex has a HR-value. This is achieved by defining a suitable Bellman operator L
and proving that the least fixed-point of L is the tuple of all HR-values. More precisely,
let $L : (\mathbb{R}^{\geq 0}_\infty)^V \to (\mathbb{R}^{\geq 0}_\infty)^V$, where y = L(x) is defined as follows:

$$y_v = \begin{cases}
r(v) + \sup_{v \to v'} x_{v'} & \text{if } v \in V_\Box \\
r(v) + \inf_{v \to v'} x_{v'} & \text{if } v \in V_\Diamond \\
r(v) + \sum_{v \to v'} x_{v'} \cdot Prob(v)(v, v') & \text{if } v \in V_\bigcirc .
\end{cases}$$
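For intuition, the operator L can be written out for a finite game and iterated from the zero vector. The sketch below rests on assumptions that the paper does not make: the game is finite, all values are finite, and plain iteration from zero is used as a stand-in for the least fixed point (in infinite or infinitely-branching games this iteration need not reach the least fixed point after countably many steps). The names `bellman_step`, `least_fixed_point`, and the toy game are hypothetical.

```python
from typing import Dict, List


def bellman_step(x: Dict[str, float], succ: Dict[str, List[str]],
                 owner: Dict[str, str], prob: Dict[str, Dict[str, float]],
                 reward: Dict[str, float]) -> Dict[str, float]:
    """One application y = L(x) of the Bellman operator."""
    y = {}
    for v in succ:
        if owner[v] == "max":
            best = max(x[u] for u in succ[v])        # sup over successors
        elif owner[v] == "min":
            best = min(x[u] for u in succ[v])        # inf over successors
        else:
            best = sum(prob[v][u] * x[u] for u in succ[v])
        y[v] = reward[v] + best
    return y


def least_fixed_point(succ, owner, prob, reward,
                      max_iter: int = 1000, tol: float = 1e-12) -> Dict[str, float]:
    """Iterate L from the zero vector until the update is below tol."""
    x = {v: 0.0 for v in succ}
    for _ in range(max_iter):
        y = bellman_step(x, succ, owner, prob, reward)
        if max(abs(y[v] - x[v]) for v in succ) <= tol:
            return y
        x = y
    return x


# Toy game (assumed for illustration): reward 1 is collected once, in "goal".
SUCC = {"v0": ["a", "b"], "a": ["goal", "sink"], "b": ["a", "sink"],
        "goal": ["sink"], "sink": ["sink"]}
OWNER = {"v0": "max", "a": "rand", "b": "min", "goal": "rand", "sink": "rand"}
PROB = {"a": {"goal": 0.5, "sink": 0.5}, "goal": {"sink": 1.0}, "sink": {"sink": 1.0}}
REWARD = {"v0": 0.0, "a": 0.0, "b": 0.0, "goal": 1.0, "sink": 0.0}
```

In this toy game Player Max moves from v0 to a (value 1/2) rather than to b, where Player Min would steer the play into the sink.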



A proof of the following lemma can be found in Appendix A. Some parts of this proof
are subtle, and we also need to make several observations that are useful for proving the
other results.

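Lemma 5. The operator L has the least fixed point K (w.r.t. ⊑) and for every v ∈ V we
have that

$$K_v \;=\; \sup_{\sigma \in HR_\Box} \inf_{\pi \in HR_\Diamond} \mathbb{E}_v^{\sigma,\pi}[\mathrm{Acc}] \;=\; \inf_{\pi \in HR_\Diamond} \sup_{\sigma \in HR_\Box} \mathbb{E}_v^{\sigma,\pi}[\mathrm{Acc}] \;=\; Val_{HR}(v).$$

Moreover, for every ε > 0 there is $\pi_\varepsilon \in HD_\Diamond$ such that for every v ∈ V we have that
$\sup_{\sigma \in HR_\Box} \mathbb{E}_v^{\sigma,\pi_\varepsilon}[\mathrm{Acc}] \leq Val_{HR}(v) \oplus \varepsilon$.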

To complete our proof of Theorem 3 (a), we need to show the existence of a
HD-value in every vertex, and demonstrate that HR and HD values are equal. Due to
Lemma 5, for every ε > 0 there is $\pi_\varepsilon \in HD_\Diamond$ such that $\pi_\varepsilon$ is ε-HR-optimal in every
vertex. Hence, it suffices to show the same for Player □. The following lemma is proved
in Appendix B.

Lemma 6. For every ε > 0, there is $\sigma_\varepsilon \in HD_\Box$ such that $\sigma_\varepsilon$ is ε-HR-optimal in every
vertex.

The next lemma proves Item (b) of Theorem 3.
Lemma 7. Consider the vertex v of the game shown in Fig. 2, where t is the only target
vertex and all probability distributions assigned to stochastic states are uniform. Then

(a) $\sup_{\sigma \in MD_\Box} \inf_{\pi \in MD_\Diamond} \mathbb{E}_v^{\sigma,\pi}[\mathrm{Reach}] = \sup_{\sigma \in MR_\Box} \inf_{\pi \in MR_\Diamond} \mathbb{E}_v^{\sigma,\pi}[\mathrm{Reach}] = 0$;

(b) $\inf_{\pi \in MD_\Diamond} \sup_{\sigma \in MD_\Box} \mathbb{E}_v^{\sigma,\pi}[\mathrm{Reach}] = \inf_{\pi \in MR_\Diamond} \sup_{\sigma \in MR_\Box} \mathbb{E}_v^{\sigma,\pi}[\mathrm{Reach}] = 1$.

Proof. We start by proving item (a) for MD strategies. Let $\sigma^* \in MD_\Box$. We show that
$\inf_{\pi \in MD_\Diamond} \mathbb{E}_v^{\sigma^*,\pi}[\mathrm{Reach}] = 0$. Let us fix an arbitrarily small ε > 0. We show that there
is a suitable $\pi^* \in MD_\Diamond$ such that $\mathbb{E}_v^{\sigma^*,\pi^*}[\mathrm{Reach}] \leq \varepsilon$. If the probability of reaching
the vertex u from v under the strategy $\sigma^*$ is at most ε, we are done. Otherwise, let $p_s$
be the probability of visiting the vertex s from v under the strategy $\sigma^*$ without passing
through the vertex u. Note that $p_s > 0$ and $p_s$ does not depend on the strategy chosen by
Player ◇. The strategy $\pi^*$ selects a suitable successor of u such that the probability $p_t$
of visiting the vertex t from u without passing through the vertex v satisfies $p_t / p_s < \varepsilon$
(note that $p_t$ can be arbitrarily small but positive). Then

$$\mathbb{E}_v^{\sigma^*,\pi^*}[\mathrm{Reach}] \;\leq\; \sum_{i=1}^{\infty} (1 - p_s)^i \, p_t \;=\; \frac{(1 - p_s)\, p_t}{p_s} \;<\; \varepsilon .$$

For MR strategies, the argument is the same.

Item (b) is proven similarly. We show that for all $\pi^* \in MD_\Diamond$ and 0 < ε < 1 there
exists a suitable $\sigma^* \in MD_\Box$ such that $\mathbb{E}_v^{\sigma^*,\pi^*}[\mathrm{Reach}] \geq 1 - \varepsilon$. Let $p_t$ be the probability of
visiting t from u without passing through the vertex v under the strategy $\pi^*$. We choose
the strategy $\sigma^*$ so that the probability $p_s$ of visiting the vertex s from v without passing
through the vertex u satisfies $p_s / p_t < \varepsilon$. Note that almost all runs initiated in v eventually
visit either s or t under $(\sigma^*, \pi^*)$. Since the probability of visiting s is bounded by ε (the
computation is similar to the one of item (a)), we obtain $\mathbb{E}_v^{\sigma^*,\pi^*}[\mathrm{Reach}] \geq 1 - \varepsilon$. For MR
strategies, the proof is almost the same. □
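The asymmetry exploited in this proof can be checked numerically on a simplified abstraction. The sketch below does not model the game of Fig. 2 itself. It assumes a two-parameter chain in which each attempt from v ends in s with probability p_s and otherwise reaches u, and each attempt from u ends in t with probability p_t and otherwise returns to v. The function names and the choice of replies (a factor ε/2) are hypothetical.

```python
def reach_probability(p_s: float, p_t: float) -> float:
    """Probability of eventually visiting t from v in the abstract chain.

    Solves R = (1 - p_s) * (p_t + (1 - p_t) * R); requires p_s, p_t in (0, 1].
    """
    loop = (1.0 - p_s) * (1.0 - p_t)   # probability of returning to v without ending
    return (1.0 - p_s) * p_t / (1.0 - loop)


def min_reply(p_s: float, eps: float) -> float:
    """Player Min moves second: pick p_t with p_t / p_s < eps."""
    return eps * p_s / 2.0


def max_reply(p_t: float, eps: float) -> float:
    """Player Max moves second: pick p_s with p_s / p_t < eps."""
    return eps * p_t / 2.0
```

Whoever fixes a memoryless strategy first commits to a positive per-round probability, and the opponent can then choose a per-round probability that is smaller by an arbitrary factor. In this abstraction the sup-inf is therefore 0 while the inf-sup is 1, mirroring items (a) and (b).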

We continue by proving Theorem 4. This theorem follows immediately from
Lemma 5 and the following proposition:

Proposition 8. If G is ◇-finitely-branching, then

1. for all v ∈ V and ε > 0, there is $\sigma_\varepsilon \in MD_\Box$ such that $\sigma_\varepsilon$ is ε-HR-optimal in v;

2. there is $\pi \in MD_\Diamond$ such that π is HR-optimal in every vertex.

As an immediate corollary to Proposition 8, we obtain the following result:

Corollary 9. If G is ◇-finitely-branching, $V_\Box$ is finite, and every vertex of $V_\Box$ has finitely
many successors, then there is $\sigma \in MD_\Box$ such that σ is HR-optimal in every vertex.

Proof. Due to Proposition 8, for every vertex v and every ε > 0, there is $\sigma_\varepsilon \in MD_\Box$ such
that $\sigma_\varepsilon$ is ε-HR-optimal in v. Since $V_\Box$ is finite and every vertex of $V_\Box$ has only finitely
many successors, there are only finitely many MD strategies for Player □. Hence,
there is a MD strategy σ that is ε-HR-optimal in v for infinitely many ε from the set
{1, 1/2, 1/4, ...}. Such a strategy is clearly HR-optimal in v. Note that σ is HR-optimal
in every vertex which can be reached from v under σ and some strategy π for Player ◇.
For the remaining vertices, we can repeat the argument, and thus eventually produce a
MD strategy that is HR-optimal in every vertex. □
Fig. 2. A game whose vertex v has neither MD-value nor MR-value.

Hence, if all non-stochastic vertices have finitely many successors and $V_\Box$ is finite,
then both players have HR-optimal MD strategies. This can be seen as a (tight)
generalization of the corresponding result for finite-state games [19].

The rest of this section is devoted to a proof of Proposition 8. We start with Item 1.
The strategy $\sigma_\varepsilon$ is constructed by employing discounting. Assume, w.l.o.g., that rewards
are bounded by 1 (if they are not, we may split every state v with a reward r(v) into a
sequence of $\lceil r(v) \rceil$ states, each with the reward $r(v)/\lceil r(v) \rceil$). Given λ ∈ (0, 1), define
$\mathrm{Acc}^\lambda : Run \to \mathbb{R}^{\geq 0}$ to be a function which to every run ω assigns
$\mathrm{Acc}^\lambda(\omega) = \sum_{i=0}^{\infty} \lambda^i \cdot r(\omega(i))$.

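Lemma 10. For λ sufficiently close to one we have that

$$\sup_{\sigma \in HR_\Box} \inf_{\pi \in HR_\Diamond} \mathbb{E}_v^{\sigma,\pi}(\mathrm{Acc}^\lambda) \;\geq\; Val_{HR}(v) \ominus \frac{\varepsilon}{2} .$$

Proof. We show that for every ε > 0 there is n ≥ 0 such that the expected reward that
Player □ may accumulate up to n steps is ε-close to $Val_{HR}(v)$ no matter what Player ◇
is doing. Formally, define $\mathrm{Acc}_n : Run \to \mathbb{R}^{\geq 0}$ to be a function which to every run ω
assigns $\mathrm{Acc}_n(\omega) = \sum_{i=0}^{n} r(\omega(i))$. The following lemma is proved in Appendix C.

Lemma 11. If G is ◇-finitely-branching, then for every v ∈ V there is n ∈ $\mathbb{N}$ such that

$$\sup_{\sigma \in HR_\Box} \inf_{\pi \in HR_\Diamond} \mathbb{E}_v^{\sigma,\pi}(\mathrm{Acc}_n) \;\geq\; Val_{HR}(v) \ominus \frac{\varepsilon}{4} .$$

Clearly, if λ is close to one, then for every run ω we have that

$$\mathrm{Acc}^\lambda(\omega) \;\geq\; \mathrm{Acc}_n(\omega) - \frac{\varepsilon}{4} .$$

Thus,

$$\sup_{\sigma \in HR_\Box} \inf_{\pi \in HR_\Diamond} \mathbb{E}_v^{\sigma,\pi}(\mathrm{Acc}^\lambda) \;\geq\; \sup_{\sigma \in HR_\Box} \inf_{\pi \in HR_\Diamond} \mathbb{E}_v^{\sigma,\pi}(\mathrm{Acc}_n) - \frac{\varepsilon}{4} \;\geq\; Val_{HR}(v) \ominus \frac{\varepsilon}{2} .$$

This proves Lemma 10.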
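The step "if λ is close to one" can be made concrete under the assumption (made in the text) that rewards are bounded by 1. One sufficient choice, which is an assumption of this sketch and not taken from the paper, is to pick λ with 1 − λ^n = ε/(4(n+1)): the first n+1 discounted rewards then lose at most (n+1)(1 − λ^n) = ε/4 in total. The helper names below are hypothetical.

```python
import random
from typing import List


def acc_prefix(rewards: List[float]) -> float:
    """Acc_n: undiscounted sum of the first n+1 rewards."""
    return sum(rewards)


def acc_discounted(rewards: List[float], lam: float) -> float:
    """Discounted sum over the same prefix; a lower bound on Acc^lambda of the run."""
    return sum((lam ** i) * r for i, r in enumerate(rewards))


def lambda_for(n: int, eps: float) -> float:
    """A discount factor with 1 - lam**n = eps / (4 * (n + 1)); assumes n >= 1."""
    return (1.0 - eps / (4.0 * (n + 1))) ** (1.0 / n)


def check(n: int = 20, eps: float = 0.1, trials: int = 100, seed: int = 0) -> bool:
    """Test the bound on random reward prefixes with rewards in [0, 1]."""
    rng = random.Random(seed)
    lam = lambda_for(n, eps)
    for _ in range(trials):
        rewards = [rng.random() for _ in range(n + 1)]
        if acc_discounted(rewards, lam) < acc_prefix(rewards) - eps / 4 - 1e-12:
            return False
    return True
```

Since all rewards are non-negative, the discounted sum over the whole run is at least the discounted sum over its prefix, so the bound checked here implies the inequality used in the proof.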

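So, it suffices to find a MD strategy $\sigma_\varepsilon$ satisfying

$$\inf_{\pi \in HR_\Diamond} \mathbb{E}_v^{\sigma_\varepsilon,\pi}(\mathrm{Acc}^\lambda) \;\geq\; \sup_{\sigma \in HR_\Box} \inf_{\pi \in HR_\Diamond} \mathbb{E}_v^{\sigma,\pi}(\mathrm{Acc}^\lambda) - \frac{\varepsilon}{2} .$$

We define such a strategy as follows. Let us fix some ℓ ∈ $\mathbb{N}$ satisfying

$$\frac{\lambda^{\ell}}{1 - \lambda} \cdot \max_{v \in V} r(v) \;<\; \frac{\varepsilon}{8} .$$

Intuitively, the discounted reward accumulated after ℓ steps can be at most ε/8. In a given
vertex $v \in V_\Box$, the strategy $\sigma_\varepsilon$ chooses a fixed successor vertex u satisfying

$$\sup_{\sigma \in HR_\Box} \inf_{\pi \in HR_\Diamond} \mathbb{E}_u^{\sigma,\pi}(\mathrm{Acc}^\lambda) \;\geq\; \sup_{v \to u'} \, \sup_{\sigma \in HR_\Box} \inf_{\pi \in HR_\Diamond} \mathbb{E}_{u'}^{\sigma,\pi}(\mathrm{Acc}^\lambda) - \frac{\varepsilon}{4\ell} .$$

Now we show that

$$\inf_{\pi \in HR_\Diamond} \mathbb{E}_v^{\sigma_\varepsilon,\pi}(\mathrm{Acc}^\lambda) \;\geq\; \sup_{\sigma \in HR_\Box} \inf_{\pi \in HR_\Diamond} \mathbb{E}_v^{\sigma,\pi}(\mathrm{Acc}^\lambda) - \frac{\varepsilon}{2} ,$$

which finishes the proof of Item 1 of Proposition 8.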

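For every k ∈ $\mathbb{N}$ we denote by $\sigma_k$ a strategy for Player □ defined as follows: For
the first k steps the strategy makes the same choices as $\sigma_\varepsilon$, i.e., chooses, in each state
$v \in V_\Box$, a next state u satisfying

$$\sup_{\sigma \in HR_\Box} \inf_{\pi \in HR_\Diamond} \mathbb{E}_u^{\sigma,\pi}(\mathrm{Acc}^\lambda) \;\geq\; \sup_{v \to u'} \, \sup_{\sigma \in HR_\Box} \inf_{\pi \in HR_\Diamond} \mathbb{E}_{u'}^{\sigma,\pi}(\mathrm{Acc}^\lambda) - \frac{\varepsilon}{4k} .$$

From the (k+1)-st step on, say in a state u, the strategy follows some strategy ζ satisfying

$$\inf_{\pi \in HR_\Diamond} \mathbb{E}_u^{\zeta,\pi}(\mathrm{Acc}^\lambda) \;\geq\; \sup_{\sigma \in HR_\Box} \inf_{\pi \in HR_\Diamond} \mathbb{E}_u^{\sigma,\pi}(\mathrm{Acc}^\lambda) - \frac{\varepsilon}{8} .$$

A simple induction reveals that $\sigma_k$ satisfies

$$\inf_{\pi \in HR_\Diamond} \mathbb{E}_v^{\sigma_k,\pi}(\mathrm{Acc}^\lambda) \;\geq\; \sup_{\sigma \in HR_\Box} \inf_{\pi \in HR_\Diamond} \mathbb{E}_v^{\sigma,\pi}(\mathrm{Acc}^\lambda) - \frac{3\varepsilon}{8} \tag{2}$$

(Intuitively, the error of each of the first k steps is at most ε/(4k) and thus the total error of
the first k steps is at most k · ε/(4k) = ε/4. The rest has the error at most ε/8 and thus the total
error is at most 3ε/8.)

We consider k = ℓ (recall that $\frac{\lambda^{\ell}}{1-\lambda} \cdot \max_{v \in V} r(v) < \frac{\varepsilon}{8}$). Then

$$\inf_{\pi \in HR_\Diamond} \mathbb{E}_v^{\sigma_\varepsilon,\pi}(\mathrm{Acc}^\lambda) \;\geq\; \inf_{\pi \in HR_\Diamond} \mathbb{E}_v^{\sigma_k,\pi}(\mathrm{Acc}^\lambda) - \frac{\varepsilon}{8} \;\geq\; \sup_{\sigma \in HR_\Box} \inf_{\pi \in HR_\Diamond} \mathbb{E}_v^{\sigma,\pi}(\mathrm{Acc}^\lambda) - \frac{\varepsilon}{2} .$$

Here the first inequality follows from the fact that $\sigma_k$ behaves similarly to $\sigma_\varepsilon$ on the first
k = ℓ steps and the discounted reward accumulated after k steps is at most ε/8. The second
inequality follows from Equation (2).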

It remains to prove Item 2 of Proposition 8. The MD strategy π can be easily
constructed as follows: In every state $v \in V_\Diamond$, the strategy π chooses a successor u
minimizing $Val_{HR}(u)$ among all successors of v. We show in Appendix D that this is indeed an
optimal strategy.
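The construction of this MD strategy is a one-line selection once the values are known. The sketch below assumes a finite representation in which the HR-values are given as a dictionary; the function name `min_md_strategy` and the example values are hypothetical.

```python
from typing import Dict, List


def min_md_strategy(succ: Dict[str, List[str]], owner: Dict[str, str],
                    value: Dict[str, float]) -> Dict[str, str]:
    """For each vertex of Player Min, pick a successor with the smallest value."""
    return {
        v: min(succ[v], key=lambda u: value[u])   # minimum exists: finitely many successors
        for v in succ
        if owner[v] == "min"
    }


# Illustrative data (assumed): b belongs to Player Min and may move to a or sink.
SUCC = {"a": ["sink"], "b": ["a", "sink"], "sink": ["sink"]}
OWNER = {"a": "rand", "b": "min", "sink": "rand"}
VALUE = {"a": 0.5, "b": 0.0, "sink": 0.0}
```

The finite-branching assumption on Player ◇ is exactly what guarantees that the minimum is attained. With infinitely many successors the infimum of the values need not be realized by any single successor, which is why optimal strategies for Player ◇ may fail to exist in general games.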

4 Conclusions 

We have considered infinite-state stochastic games with the total accumulated reward
objective, and clarified the determinacy questions for the HR, HD, MR, and MD
strategy types. Our results are almost complete. One natural question which remains open
is whether Player □ needs memory to play ε-HR-optimally in general games (it follows
from the previous works, e.g., [8, 20], that ε-HR-optimal strategies for Player ◇ require
infinite memory in general).
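Technical Appendix

A Proof of Lemma 5

Lemma 5. The operator L has the least fixed point K (w.r.t. ⊑) and for every v ∈ V we
have that

$$K_v \;=\; \sup_{\sigma \in HR_\Box} \inf_{\pi \in HR_\Diamond} \mathbb{E}_v^{\sigma,\pi}[\mathrm{Acc}] \;=\; \inf_{\pi \in HR_\Diamond} \sup_{\sigma \in HR_\Box} \mathbb{E}_v^{\sigma,\pi}[\mathrm{Acc}] \;=\; Val_{HR}(v).$$

Moreover, for every ε > 0 there is $\pi_\varepsilon \in HD_\Diamond$ such that for every v ∈ V we have that
$\sup_{\sigma \in HR_\Box} \mathbb{E}_v^{\sigma,\pi_\varepsilon}[\mathrm{Acc}] \leq Val_{HR}(v) \oplus \varepsilon$.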

The partially ordered set $((\mathbb{R}^{\geq 0}_\infty)^V, \sqsubseteq)$, where ⊑ is the standard componentwise
ordering, is a complete lattice. Moreover, from the definition of L we can easily see that L is
monotonic, i.e., L(x) ⊑ L(x') whenever x ⊑ x'. Thus, by the Knaster-Tarski theorem the
operator L has the least fixed point, which we denote by K.

In order to prove that $K_v = Val_{HR}(v)$ for every v ∈ V, it suffices to prove the
following:

$$\forall v \in V : \quad K_v \;\leq\; \sup_{\sigma \in HR_\Box} \inf_{\pi \in HR_\Diamond} \mathbb{E}_v^{\sigma,\pi}(\mathrm{Acc}) \;\leq\; \inf_{\pi \in HR_\Diamond} \sup_{\sigma \in HR_\Box} \mathbb{E}_v^{\sigma,\pi}(\mathrm{Acc}) \;\leq\; K_v . \tag{3}$$

The second inequality holds trivially, so it suffices to prove the remaining ones.

To prove the first inequality, it suffices to show that the vector $S \in (\mathbb{R}^{\geq 0}_\infty)^V$ defined by
$S_v = \sup_{\sigma \in HR_\Box} \inf_{\pi \in HR_\Diamond} \mathbb{E}_v^{\sigma,\pi}(\mathrm{Acc})$ is a fixed point of L. Since K is the least fixed point of
L, the inequality then follows. So let v ∈ V be arbitrary. We will show that $L(S)_v = S_v$.

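If $v \in V_\Box$, then we have to show that

$$L(S)_v \;=\; r(v) + \sup_{v \to v'} \, \sup_{\sigma \in HR_\Box} \inf_{\pi \in HR_\Diamond} \mathbb{E}_{v'}^{\sigma,\pi}(\mathrm{Acc}) \;=\; \sup_{\sigma \in HR_\Box} \inf_{\pi \in HR_\Diamond} \mathbb{E}_v^{\sigma,\pi}(\mathrm{Acc}) \;=\; S_v .$$

Assume, for the sake of contradiction, that the equality does not hold, i.e., that either
$L(S)_v < S_v$ or $L(S)_v > S_v$. If $L(S)_v > S_v$, then there is a transition v → v' and a strategy
$\sigma' \in HR_\Box$ such that $r(v) + \inf_{\pi \in HR_\Diamond} \mathbb{E}_{v'}^{\sigma',\pi}(\mathrm{Acc}) > \sup_{\sigma \in HR_\Box} \inf_{\pi \in HR_\Diamond} \mathbb{E}_v^{\sigma,\pi}(\mathrm{Acc})$. If we
denote by σ'' the strategy that moves from the initial vertex v to v' with probability 1
and then starts to behave exactly like the strategy σ', then we obtain

$$\inf_{\pi \in HR_\Diamond} \mathbb{E}_v^{\sigma'',\pi}(\mathrm{Acc}) \;=\; r(v) + \inf_{\pi \in HR_\Diamond} \mathbb{E}_{v'}^{\sigma',\pi}(\mathrm{Acc}) \;>\; \sup_{\sigma \in HR_\Box} \inf_{\pi \in HR_\Diamond} \mathbb{E}_v^{\sigma,\pi}(\mathrm{Acc}) \;\geq\; \inf_{\pi \in HR_\Diamond} \mathbb{E}_v^{\sigma'',\pi}(\mathrm{Acc}),$$

a contradiction. So assume that $L(S)_v < S_v$. Then there is some δ > 0 and some function
$f : HR_\Box \times V \to HR_\Diamond$ such that for every transition v → v' and every $\sigma \in HR_\Box$ we have
$r(v) + \mathbb{E}_{v'}^{\sigma, f(\sigma, v')}(\mathrm{Acc}) \leq S_v \ominus \delta$. For any strategy σ we denote by $p^\sigma_{v'}$ the probability that the strategy
σ assigns to the transition v → v' in a game starting in v. Then we can write

$$\begin{aligned}
\sup_{\sigma \in HR_\Box} \inf_{\pi \in HR_\Diamond} \mathbb{E}_v^{\sigma,\pi}(\mathrm{Acc}) &= r(v) + \sup_{\sigma \in HR_\Box} \inf_{\pi \in HR_\Diamond} \sum_{v \to v'} p^\sigma_{v'} \cdot \mathbb{E}_{v'}^{\sigma,\pi}(\mathrm{Acc}) \\
&\leq r(v) + \sup_{\sigma \in HR_\Box} \sum_{v \to v'} p^\sigma_{v'} \cdot \mathbb{E}_{v'}^{\sigma, f(\sigma, v')}(\mathrm{Acc}) \;\leq\; S_v \ominus \delta \\
&< S_v = \sup_{\sigma \in HR_\Box} \inf_{\pi \in HR_\Diamond} \mathbb{E}_v^{\sigma,\pi}(\mathrm{Acc}),
\end{aligned}$$

again a contradiction.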

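For $v \in V_\Diamond$ the proof is dual to the proof for $v \in V_\Box$, so we omit it. Finally, for
$v \in V_\bigcirc$ we have

$$\begin{aligned}
L(S)_v &= r(v) + \sum_{v \to v'} Prob(v)(v, v') \cdot \sup_{\sigma \in HR_\Box} \inf_{\pi \in HR_\Diamond} \mathbb{E}_{v'}^{\sigma,\pi}(\mathrm{Acc}) \\
&= \sup_{\sigma \in HR_\Box} \inf_{\pi \in HR_\Diamond} \Big( r(v) + \sum_{v \to v'} Prob(v)(v, v') \cdot \mathbb{E}_{v'}^{\sigma,\pi}(\mathrm{Acc}) \Big) \\
&= \sup_{\sigma \in HR_\Box} \inf_{\pi \in HR_\Diamond} \mathbb{E}_v^{\sigma,\pi}(\mathrm{Acc}) \;=\; S_v .
\end{aligned}$$

This concludes the proof that S is a fixed point of L and thus also the proof of the first
inequality in (3).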

It remains to prove the third inequality in (3). To this end we prove that for every
ε > 0 there is a strategy $\pi_\varepsilon \in HD_\Diamond$ such that for every v ∈ V we have
$\sup_{\sigma \in HR_\Box} \mathbb{E}_v^{\sigma,\pi_\varepsilon}(\mathrm{Acc}) \leq K_v + \varepsilon$. Note that this will also prove the second part of the lemma.

If $K_v = \infty$, then the desired inequality holds trivially for any strategy of Player ◇
(and particularly for every $\pi \in HD_\Diamond$). So assume that $K_v$ is finite and fix arbitrary ε > 0.
We define the strategy $\pi_\varepsilon$ as follows: let wu be any finite path with $u \in V_\Diamond$. Since K is a
fixed point of L, there must be a successor u' of u such that $r(u) + K_{u'} \leq K_u + \varepsilon/2^{|wu|+1}$.
We set $\pi_\varepsilon(wu)$ to be a Dirac distribution that selects the transition u → u' with probability
1.

We will now prove the following lemma, which not only shows that the strategy $\pi_\varepsilon$ has the desired property, but will also be useful later.

Lemma 12. Let $\varepsilon > 0$ be arbitrary and let $\pi_\varepsilon$ be any deterministic strategy of player ◇ that has the following property: for every finite path $wu$ starting in $v$ and ending in $u \in V_\Diamond$, the transition $u \to u'$ selected by $\pi_\varepsilon(wu)$ satisfies $r(u) + K_{u'} \le K_u + \varepsilon/2^{|wu|+1}$. Then $\sup_{\sigma \in \mathrm{HR}_\Box} E^{\sigma,\pi_\varepsilon}_v(\mathit{Acc}) \le K_v + \varepsilon$.

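Proof. We will prove that for every $v$, every $n \in \mathbb{N}_0$ and every strategy $\sigma$ of player □ we have $E^{\sigma,\pi_\varepsilon}_v\big(\sum_{i=0}^{n} r(\omega(i))\big) \le K_v + \varepsilon$. By the monotone convergence theorem this means that $E^{\sigma,\pi_\varepsilon}_v(\mathit{Acc}) \le K_v + \varepsilon$ for every $\sigma$, and thus also $\sup_{\sigma \in \mathrm{HR}_\Box} E^{\sigma,\pi_\varepsilon}_v(\mathit{Acc}) \le K_v + \varepsilon$.

So let us fix arbitrary $v$, $n$ and $\sigma$. Recall that $E^{\sigma,\pi}_v[X \mid Y]$ denotes the conditional expectation of random variable $X$ given the event $Y$. We show that for every $0 \le k \le n$ and every finite path $w = v_0, \dots, v_k$ we have

\[
E^{\sigma,\pi_\varepsilon}_v\Big[\sum_{i=k}^{n} r(\omega(i)) \;\Big|\; \mathit{Run}(w)\Big] \le K_{v_k} + \sum_{i=k}^{n} \frac{\varepsilon}{2^{i+1}} .
\]

In particular, this means that $E^{\sigma,\pi_\varepsilon}_v\big(\sum_{i=0}^{n} r(\omega(i))\big) = E^{\sigma,\pi_\varepsilon}_v\big[\sum_{i=0}^{n} r(\omega(i)) \mid \mathit{Run}(v)\big] \le K_v + \varepsilon$. We proceed by downward induction on $k$. If $k = n$, then we trivially have

\[
E^{\sigma,\pi_\varepsilon}_v\Big[\sum_{i=k}^{n} r(\omega(i)) \;\Big|\; \mathit{Run}(w)\Big] = r(v_k) \le L(K)_{v_k} = K_{v_k},
\]

where the inequality follows from the definition of $L$.

Now suppose that $k < n$. We distinguish two cases. If $v_k \in V_\Diamond$, denote by $u$ the successor of $v_k$ chosen by $\pi_\varepsilon$. Then we have

\[
\begin{aligned}
E^{\sigma,\pi_\varepsilon}_v\Big[\sum_{i=k}^{n} r(\omega(i)) \;\Big|\; \mathit{Run}(w)\Big]
 &= r(v_k) + E^{\sigma,\pi_\varepsilon}_v\Big[\sum_{i=k+1}^{n} r(\omega(i)) \;\Big|\; \mathit{Run}(wu)\Big] \\
 &\le r(v_k) + K_u + \sum_{i=k+1}^{n} \frac{\varepsilon}{2^{i+1}} \\
 &\le K_{v_k} + \sum_{i=k}^{n} \frac{\varepsilon}{2^{i+1}},
\end{aligned}
\]

where the inequality on the second line follows from the induction hypothesis and the inequality on the third line follows from the definition of $\pi_\varepsilon$.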

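If $v_k \in V_\Box \cup V_\bigcirc$, then we can see that $E^{\sigma,\pi_\varepsilon}_v\big[\sum_{i=k}^{n} r(\omega(i)) \mid \mathit{Run}(w)\big] = r(v_k) + \sum_{v_k \to u} p_u \cdot E^{\sigma,\pi_\varepsilon}_v\big[\sum_{i=k+1}^{n} r(\omega(i)) \mid \mathit{Run}(wu)\big]$ for some sequence of real numbers $(p_u)_{v_k \to u}$ s.t. $p_u \ge 0$ for every $u$ and $\sum_{v_k \to u} p_u = 1$. By the induction hypothesis we have $E^{\sigma,\pi_\varepsilon}_v\big[\sum_{i=k+1}^{n} r(\omega(i)) \mid \mathit{Run}(wu)\big] \le K_u + \sum_{i=k+1}^{n} \varepsilon/2^{i+1}$ for every $v_k \to u$. Finally, from the definition of $L$ we obtain $K_{v_k} = L(K)_{v_k} \ge r(v_k) + \sum_{v_k \to u} p_u \cdot K_u$ (the inequality can be strict only if $v_k \in V_\Box$). Together, we have

\[
E^{\sigma,\pi_\varepsilon}_v\Big[\sum_{i=k}^{n} r(\omega(i)) \;\Big|\; \mathit{Run}(w)\Big]
 \le r(v_k) + \sum_{v_k \to u} p_u \cdot K_u + \sum_{i=k+1}^{n} \frac{\varepsilon}{2^{i+1}}
 \le K_{v_k} + \sum_{i=k}^{n} \frac{\varepsilon}{2^{i+1}} .
\]

□

This finishes the proof of Lemma 5.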
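The induction in the proof of Lemma 12 rests on the geometric error budget $\sum_{i=k}^{n} \varepsilon/2^{i+1}$, which never exceeds $\varepsilon$. The following Python sketch checks this with exact rational arithmetic; the helper name `budget` and the sample values are assumptions made for illustration.

```python
from fractions import Fraction

def budget(eps, k, n):
    """Sum of eps / 2**(i + 1) for i = k, ..., n."""
    return sum(eps / 2 ** (i + 1) for i in range(k, n + 1))

eps = Fraction(1, 3)
total = budget(eps, 0, 20)  # full budget used along a path of 21 vertices
```

The induction step adds exactly one term: passing from index $k+1$ to $k$ increases the budget by $\varepsilon/2^{k+1}$, which is the slack granted to the choice made in $v_k$.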



B Proof of Lemma 6

Lemma 6. For every $\varepsilon > 0$, there is $\sigma_\varepsilon \in \mathrm{HD}_\Box$ such that $\sigma_\varepsilon$ is $\varepsilon$-HR-optimal in every vertex.

Let $\varepsilon > 0$ be arbitrary. It suffices to fix an arbitrary initial vertex $v$, define the choices of the strategy $\sigma_\varepsilon$ only on the finite paths starting in $v$, and verify that the resulting strategy is $\varepsilon$-HR-optimal in $v$. By repeating this construction for every $v \in V$ we obtain a strategy that is $\varepsilon$-HR-optimal in every vertex.

For the sake of better readability, we first present the detailed construction of the deterministic $\varepsilon$-HR-optimal strategy $\sigma_\varepsilon$ for games in which the HR-value is finite in every vertex. An almost identical construction can be used for games with arbitrary HR-values; there are some subtle technical differences that will be presented in the second part of the proof.

We already know that the least fixed point $K$ of the operator $L$ is equal to the vector of HR-values. Moreover, from the standard results of fixed-point theory (see, e.g., Theorem 5.1 in [13]) we know that $K = L^{\alpha}(\mathbf{0})$ for some ordinal number $\alpha$ (where $\mathbf{0}$ is the vector of zeros and where the transfinite iteration of $L$ is defined in a standard way, i.e. we put $L^{\beta}(\mathbf{0}) = \sup_{\gamma < \beta} L^{\gamma}(\mathbf{0})$ for every limit ordinal $\beta$). The following lemma is instrumental in the construction of $\sigma_\varepsilon$.

Lemma 13. Let $\varepsilon > 0$ be arbitrary. Denote by $\alpha$ the ordinal number such that $L^{\alpha}(\mathbf{0})_v = \mathrm{Val}_{HR}(v)$ and denote by $\mathit{Ord}_\alpha$ the set of all ordinal numbers less than or equal to $\alpha$. Then there is a labeling function $d : \mathit{Fpath}(v) \to \mathit{Ord}_\alpha$ satisfying the following conditions:

(a) $d(v) = \alpha$.

(b) For every $wu \in \mathit{Fpath}(v)$ it holds either $d(w) = 0$ or $d(wu) < d(w)$.

(c) For every $wu \in \mathit{Fpath}(v)$, we have

\[
L^{d(wu)}(\mathbf{0})_u - \frac{\varepsilon}{2^{|wu|+1}} \le
\begin{cases}
r(u) + L^{d(wuu')}(\mathbf{0})_{u'} \text{ for some } u \to u' & \text{if } u \in V_\Box \\
r(u) + \inf_{u \to u'} L^{d(wuu')}(\mathbf{0})_{u'} & \text{if } u \in V_\Diamond \\
r(u) + \sum_{u \to u'} \mathit{Prob}(u)(u,u') \cdot L^{d(wuu')}(\mathbf{0})_{u'} & \text{if } u \in V_\bigcirc .
\end{cases}
\]

Proof. We define the labeling $d$ inductively, proceeding from the shorter paths to the longer ones. Obviously we set $d(v) = \alpha$. Now suppose that $d(wu)$ has already been defined. We will define $d(wuu')$ for all successors $u'$ of $u$ simultaneously. First let us assume that $d(wu)$ is a successor ordinal of the form $\beta + 1$. Then it suffices to put $d(wuu') = \beta$ for all successors $u'$ of $u$. From the definition of $L$ we can easily see that for every $\delta > 0$ it then holds

\[
L^{\beta+1}(\mathbf{0})_u - \delta \le
\begin{cases}
r(u) + L^{\beta}(\mathbf{0})_{u'} \text{ for some } u \to u' & \text{if } u \in V_\Box \\
r(u) + \inf_{u \to u'} L^{\beta}(\mathbf{0})_{u'} & \text{if } u \in V_\Diamond \\
r(u) + \sum_{u \to u'} \mathit{Prob}(u)(u,u') \cdot L^{\beta}(\mathbf{0})_{u'} & \text{if } u \in V_\bigcirc ,
\end{cases}
\]

so in particular the inequality in (c) holds for $wu$.

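Now let us assume that $d(wu)$ is a limit ordinal. Then $L^{d(wu)}(\mathbf{0})_u = \sup_{\gamma < d(wu)} L^{\gamma}(\mathbf{0})_u$. This means that there is $\gamma < d(wu)$ such that $L^{d(wu)}(\mathbf{0})_u - \varepsilon/2^{|wu|+2} \le L^{\gamma}(\mathbf{0})_u$. Clearly, we can assume that $\gamma = \beta + 1$ for some ordinal $\beta$. Now we again set $d(wuu') = \beta$ for all successors $u'$ of $u$. Using the argument from the previous paragraph with $\delta = \varepsilon/2^{|wu|+2}$ we obtain

\[
L^{d(wu)}(\mathbf{0})_u - \frac{\varepsilon}{2^{|wu|+1}} \le L^{\gamma}(\mathbf{0})_u - \frac{\varepsilon}{2^{|wu|+2}} \le
\begin{cases}
r(u) + L^{\beta}(\mathbf{0})_{u'} \text{ for some } u \to u' & \text{if } u \in V_\Box \\
r(u) + \inf_{u \to u'} L^{\beta}(\mathbf{0})_{u'} & \text{if } u \in V_\Diamond \\
r(u) + \sum_{u \to u'} \mathit{Prob}(u)(u,u') \cdot L^{\beta}(\mathbf{0})_{u'} & \text{if } u \in V_\bigcirc ,
\end{cases}
\]

so (c) again holds for $wu$.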

Finally, if $d(wu) = 0$, then we set $d(wuu') = 0$ for all successors $u'$ of $u$. In this way, we eventually define $d(w)$ for every finite path starting in $v$. It is obvious that $d$ satisfies (a)-(c). □
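For a vertex of player □, condition (c) says that some successor loses at most $\varepsilon/2^{|wu|+1}$ against the current iterate. The sketch below illustrates this in the simplest setting, assuming that all labels are natural numbers, that the vertex has finitely many successors, and that the iterates $L^{0}(\mathbf{0}), L^{1}(\mathbf{0}), L^{2}(\mathbf{0})$ are given as the hypothetical table `ITER`; the function name `select` is likewise an assumption.

```python
from fractions import Fraction

# Hypothetical iterates L^0(0), L^1(0), L^2(0) on a Max vertex u (reward 1)
# and its two successors a and b.
ITER = [
    {"u": Fraction(0), "a": Fraction(0), "b": Fraction(0)},
    {"u": Fraction(1), "a": Fraction(2), "b": Fraction(1)},
    {"u": Fraction(3), "a": Fraction(2), "b": Fraction(5, 2)},
]

def select(u, label, path_len, succ, r, eps):
    """Pick u2 with ITER[label][u] - eps/2**(path_len+1) <= r[u] + ITER[label-1][u2]."""
    assert label >= 1  # successors of a path labeled k + 1 are labeled k
    slack = eps / 2 ** (path_len + 1)
    for u2 in succ[u]:
        if ITER[label][u] - slack <= r[u] + ITER[label - 1][u2]:
            return u2
    raise ValueError("no successor satisfies condition (c)")

succ = {"u": ["b", "a"]}
r = {"u": Fraction(1)}
chosen = select("u", label=2, path_len=1, succ=succ, r=r, eps=Fraction(1, 4))
```

With label 2 the successors carry label 1, so `b` yields $1 + 1 = 2$, which is too small, while `a` yields $1 + 2 = 3 = L^{2}(\mathbf{0})_u$ and is selected.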



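We use the labeling $d$ provided by the previous lemma to define the $\varepsilon$-HR-optimal HD strategy $\sigma_\varepsilon$ of player □. For a given finite path $wu$ the strategy $\sigma_\varepsilon$ selects a transition $u \to u'$ such that $L^{d(wu)}(\mathbf{0})_u - \varepsilon/2^{|wu|+1} \le r(u) + L^{d(wuu')}(\mathbf{0})_{u'}$. Such a transition always exists due to the previous lemma. We now prove that the strategy $\sigma_\varepsilon$ is $\varepsilon$-HR-optimal in $v$. We will actually prove a more general statement, which we will reuse later.

Lemma 14. For every run $\omega$ denote by $t(\omega)$ the least $k$ such that $d(\omega(0), \dots, \omega(k)) = 0$ and denote by $S^{t}_{k}$ the random variable defined by $S^{t}_{k}(\omega) = \sum_{i=k}^{t(\omega)} r(\omega(i))$. Then the following holds for every $wu \in \mathit{Fpath}(v)$:

\[
\inf_{\pi \in \mathrm{HR}_\Diamond} E^{\sigma_\varepsilon,\pi}_v\big[S^{t}_{|wu|} \mid \mathit{Run}(wu)\big] \ge L^{d(wu)}(\mathbf{0})_u - \frac{\varepsilon}{2^{|wu|}} . \tag{4}
\]

In particular, we have

\[
\inf_{\pi \in \mathrm{HR}_\Diamond} E^{\sigma_\varepsilon,\pi}_v(\mathit{Acc}) \ge \inf_{\pi \in \mathrm{HR}_\Diamond} E^{\sigma_\varepsilon,\pi}_v\big[S^{t}_{0} \mid \mathit{Run}(v)\big] \ge L^{\alpha}(\mathbf{0})_v - \varepsilon = \mathrm{Val}_{HR}(v) - \varepsilon .
\]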

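Proof. We proceed by transfinite induction on $d(wu)$. If $d(wu) = 0$, then the inequality (4) clearly holds. Now suppose that $d(wu) > 0$ and that the inequality (4) holds for all finite paths labeled by ordinals $\beta < d(wu)$. We distinguish three cases depending on the type of $u$.

(1.) $u \in V_\Box$. Denote by $u'$ the successor of $u$ selected by $\sigma_\varepsilon(wu)$. Then we have

\[
\begin{aligned}
\inf_{\pi \in \mathrm{HR}_\Diamond} E^{\sigma_\varepsilon,\pi}_v\big[S^{t}_{|wu|} \mid \mathit{Run}(wu)\big]
 &= r(u) + \inf_{\pi \in \mathrm{HR}_\Diamond} E^{\sigma_\varepsilon,\pi}_v\big[S^{t}_{|wuu'|} \mid \mathit{Run}(wuu')\big] \\
 &\ge r(u) + L^{d(wuu')}(\mathbf{0})_{u'} - \frac{\varepsilon}{2^{|wu|+1}} \\
 &\ge L^{d(wu)}(\mathbf{0})_u - \frac{\varepsilon}{2^{|wu|+1}} - \frac{\varepsilon}{2^{|wu|+1}} = L^{d(wu)}(\mathbf{0})_u - \frac{\varepsilon}{2^{|wu|}},
\end{aligned}
\]

where the second line follows from the induction hypothesis and from the fact that $d(wuu') < d(wu)$, and the third line follows from the definition of $\sigma_\varepsilon$.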
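(2.) $u \in V_\Diamond$. Then we have

\[
\begin{aligned}
\inf_{\pi \in \mathrm{HR}_\Diamond} E^{\sigma_\varepsilon,\pi}_v\big[S^{t}_{|wu|} \mid \mathit{Run}(wu)\big]
 &= r(u) + \inf_{u \to u'} \inf_{\pi \in \mathrm{HR}_\Diamond} E^{\sigma_\varepsilon,\pi}_v\big[S^{t}_{|wuu'|} \mid \mathit{Run}(wuu')\big] \\
 &\ge r(u) + \inf_{u \to u'} L^{d(wuu')}(\mathbf{0})_{u'} - \frac{\varepsilon}{2^{|wu|+1}} \\
 &\ge L^{d(wu)}(\mathbf{0})_u - \frac{\varepsilon}{2^{|wu|+1}} - \frac{\varepsilon}{2^{|wu|+1}} = L^{d(wu)}(\mathbf{0})_u - \frac{\varepsilon}{2^{|wu|}},
\end{aligned}
\]

where the first line is easy, the second line again follows from the induction hypothesis and the third line follows from Lemma 13.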
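(3.) $u \in V_\bigcirc$. We denote by $u \xrightarrow{x} u'$ the fact that $\mathit{Prob}(u)(u,u') = x$. We have

\[
\begin{aligned}
\inf_{\pi \in \mathrm{HR}_\Diamond} E^{\sigma_\varepsilon,\pi}_v\big[S^{t}_{|wu|} \mid \mathit{Run}(wu)\big]
 &= r(u) + \sum_{u \xrightarrow{x} u'} x \cdot \inf_{\pi \in \mathrm{HR}_\Diamond} E^{\sigma_\varepsilon,\pi}_v\big[S^{t}_{|wuu'|} \mid \mathit{Run}(wuu')\big] \\
 &\ge r(u) + \Big( \sum_{u \xrightarrow{x} u'} x \cdot L^{d(wuu')}(\mathbf{0})_{u'} \Big) - \frac{\varepsilon}{2^{|wu|+1}} \\
 &\ge L^{d(wu)}(\mathbf{0})_u - \frac{\varepsilon}{2^{|wu|+1}} - \frac{\varepsilon}{2^{|wu|+1}} = L^{d(wu)}(\mathbf{0})_u - \frac{\varepsilon}{2^{|wu|}},
\end{aligned}
\]

where again the second and the third line follow from the induction hypothesis and Lemma 13, respectively.

□

It remains to show how to handle the case when there are vertices with infinite HR-values. The idea is the same, but the proof is more technical. We need to slightly generalize the previous two lemmas. The following lemma generalizes Lemma 13. We denote by $\mathit{last}(w)$ the last vertex on a nonempty path $w$.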

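Lemma 15. Under the assumptions of Lemma 13 there exists a labeling function $d : \mathit{Fpath}(v) \to \mathit{Ord}_\alpha$ satisfying the following conditions:

(a) $d(v) = \alpha$.

(b) For every $wu \in \mathit{Fpath}(v)$ it holds either $d(w) = 0$ or $d(wu) < d(w)$.

(c) For every $wu \in \mathit{Fpath}(v)$ such that $L^{d(wu)}(\mathbf{0})_u < \infty$, we have

\[
L^{d(wu)}(\mathbf{0})_u - \frac{\varepsilon}{2^{|wu|+1}} \le
\begin{cases}
r(u) + L^{d(wuu')}(\mathbf{0})_{u'} \text{ for some } u \to u' & \text{if } u \in V_\Box \\
r(u) + \inf_{u \to u'} L^{d(wuu')}(\mathbf{0})_{u'} & \text{if } u \in V_\Diamond \\
r(u) + \sum_{u \to u'} \mathit{Prob}(u)(u,u') \cdot L^{d(wuu')}(\mathbf{0})_{u'} & \text{if } u \in V_\bigcirc ,
\end{cases}
\]

and for every $wu \in \mathit{Fpath}(v)$ such that $L^{d(wu)}(\mathbf{0})_u = \infty$, we have

\[
\frac{1}{\varepsilon} + \varepsilon \cdot (|wu| + 1) + F(w) \le
\begin{cases}
r(u) + L^{d(wuu')}(\mathbf{0})_{u'} \text{ for some } u \to u' & \text{if } u \in V_\Box \\
r(u) + \inf_{u \to u'} L^{d(wuu')}(\mathbf{0})_{u'} & \text{if } u \in V_\Diamond \\
r(u) + \sum_{u \to u'} \mathit{Prob}(u)(u,u') \cdot L^{d(wuu')}(\mathbf{0})_{u'} & \text{if } u \in V_\bigcirc ,
\end{cases}
\]

where

\[
F(w) =
\begin{cases}
L^{d(w)}(\mathbf{0})_{\mathit{last}(w)} & \text{if } w \text{ is nonempty and } L^{d(w)}(\mathbf{0})_{\mathit{last}(w)} < \infty \\
0 & \text{otherwise.}
\end{cases}
\]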



Proof. We again define the function $d$ inductively, starting by putting $d(v) = \alpha$. Now let $wu$ be an arbitrary finite path such that $L^{d(wu)}(\mathbf{0})_u = \infty$. If $d(wu) = \beta + 1$ for some ordinal $\beta$, then we can put $d(wuu') = \beta$ for all successors $u'$ of $u$. From the definition of $L$ it then easily follows that the inequality in (c) holds for $wu$. (For example, if $u \in V_\Box$, then we have $\infty = r(u) + \sup_{u \to u'} L^{\beta}(\mathbf{0})_{u'}$ and there is surely $u \to u'$ s.t. $r(u) + L^{\beta}(\mathbf{0})_{u'} \ge 1/\varepsilon + \varepsilon \cdot (|wu| + 1) + F(w)$. It is of course possible that $L^{\beta}(\mathbf{0})_{u'} = \infty$.)

If $d(wu)$ is a limit ordinal, then there is a successor ordinal $\beta + 1 < d(wu)$ s.t. $L^{\beta+1}(\mathbf{0})_u \ge 2/\varepsilon + \varepsilon \cdot (|wu| + 1) + F(w)$. We set $d(wuu') = \beta$ for all successors $u'$ of $u$. If $L^{\beta+1}(\mathbf{0})_u = \infty$, then from the previous paragraph we get that (c) holds for $wu$. If $L^{\beta+1}(\mathbf{0})_u < \infty$, then the same argument as in the proof of Lemma 13 shows that for every $\delta > 0$ the right-hand side of the inequality in (c) is $\delta$-close to $L^{\beta+1}(\mathbf{0})_u$. If we set $\delta = 1/\varepsilon$, we get that (c) holds for $wu$.

For $wu$ with $L^{d(wu)}(\mathbf{0})_u < \infty$ we can use the same construction as in Lemma 13.

□

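For every $wu$ let us set

\[
A^{wu}_{\varepsilon} =
\begin{cases}
L^{d(wu)}(\mathbf{0})_u - \dfrac{\varepsilon}{2^{|wu|+1}} & \text{if } L^{d(wu)}(\mathbf{0})_u < \infty \\[1ex]
\dfrac{1}{\varepsilon} + \varepsilon \cdot (|wu| + 1) + F(w) & \text{otherwise,}
\end{cases}
\]

and

\[
B^{wu}_{\varepsilon} =
\begin{cases}
L^{d(wu)}(\mathbf{0})_u - \dfrac{\varepsilon}{2^{|wu|}} & \text{if } L^{d(wu)}(\mathbf{0})_u < \infty \\[1ex]
\dfrac{1}{\varepsilon} + \varepsilon \cdot |wu| + F(w) & \text{otherwise.}
\end{cases}
\]

Note that $A^{wu}_{\varepsilon} - \delta \ge B^{wu}_{\varepsilon}$ for every $0 \le \delta \le \varepsilon/2^{|wu|+1}$. We now define the $\varepsilon$-HR-optimal deterministic strategy $\sigma_\varepsilon$ as follows: for a given $wu \in \mathit{Fpath}(v)$, $\sigma_\varepsilon(wu)$ selects a transition $u \to u'$ such that $A^{wu}_{\varepsilon} \le r(u) + L^{d(wuu')}(\mathbf{0})_{u'}$. It remains to prove that $\sigma_\varepsilon$ is $\varepsilon$-HR-optimal in $v$. We generalize Lemma 14 as follows:

Lemma 16. The following holds for every $wu \in \mathit{Fpath}(v)$:

\[
\inf_{\pi \in \mathrm{HR}_\Diamond} E^{\sigma_\varepsilon,\pi}_v\big[S^{t}_{|wu|} \mid \mathit{Run}(wu)\big] \ge B^{wu}_{\varepsilon} . \tag{5}
\]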

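Proof. The proof again proceeds by transfinite induction on $d(wu)$. The base case is the same as in Lemma 14, because if $d(wu) = 0$, then $B^{wu}_{\varepsilon} = -\varepsilon/2^{|wu|}$. So assume that $d(wu) > 0$ and that (5) holds for all finite paths labeled by ordinals smaller than $d(wu)$. If $L^{d(wu)}(\mathbf{0})_u < \infty$, then we can basically proceed in exactly the same way as in Lemma 14. The only difference here is the case when $u \in V_\Diamond$, $L^{d(wu)}(\mathbf{0})_u < \infty$ and $L^{d(wuu')}(\mathbf{0})_{u'} = \infty$ for some $u \to u'$. But in this case we have $\inf_{\pi \in \mathrm{HR}_\Diamond} E^{\sigma_\varepsilon,\pi}_v\big[S^{t}_{|wuu'|} \mid \mathit{Run}(wuu')\big] \ge B^{wuu'}_{\varepsilon} \ge 1/\varepsilon + F(wu) = 1/\varepsilon + L^{d(wu)}(\mathbf{0})_u \ge 1/\varepsilon + \inf_{u \to u'} L^{d(wuu')}(\mathbf{0})_{u'}$, so the computation in part (2.) of the proof of Lemma 14 is still valid.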

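If $L^{d(wu)}(\mathbf{0})_u = \infty$, then we consider the following cases:

(1.) $u \in V_\Box$. Denote by $u'$ the successor of $u$ selected by $\sigma_\varepsilon(wu)$. Then

\[
\begin{aligned}
\inf_{\pi \in \mathrm{HR}_\Diamond} E^{\sigma_\varepsilon,\pi}_v\big[S^{t}_{|wu|} \mid \mathit{Run}(wu)\big]
 &= r(u) + \inf_{\pi \in \mathrm{HR}_\Diamond} E^{\sigma_\varepsilon,\pi}_v\big[S^{t}_{|wuu'|} \mid \mathit{Run}(wuu')\big] \\
 &\ge r(u) + B^{wuu'}_{\varepsilon},
\end{aligned}
\]

where the second line comes from the induction hypothesis. There are two possibilities. Either

\[
B^{wuu'}_{\varepsilon} = 1/\varepsilon + \varepsilon \cdot |wu| + \varepsilon + F(w) \ge 1/\varepsilon + \varepsilon \cdot |wu| + F(w) = B^{wu}_{\varepsilon}, \tag{6}
\]

or

\[
r(u) + B^{wuu'}_{\varepsilon} = r(u) + L^{d(wuu')}(\mathbf{0})_{u'} - \frac{\varepsilon}{2^{|wu|+1}} \ge A^{wu}_{\varepsilon} - \frac{\varepsilon}{2^{|wu|+1}} \ge B^{wu}_{\varepsilon}, \tag{7}
\]

where the second inequality follows from Lemma 15 and from the definition of $\sigma_\varepsilon$. In both cases the inequality (5) holds.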
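(2.) $u \in V_\Diamond$. Then we have

\[
\begin{aligned}
\inf_{\pi \in \mathrm{HR}_\Diamond} E^{\sigma_\varepsilon,\pi}_v\big[S^{t}_{|wu|} \mid \mathit{Run}(wu)\big]
 &= r(u) + \inf_{u \to u'} \inf_{\pi \in \mathrm{HR}_\Diamond} E^{\sigma_\varepsilon,\pi}_v\big[S^{t}_{|wuu'|} \mid \mathit{Run}(wuu')\big] \\
 &\ge \inf_{u \to u'} \big( r(u) + B^{wuu'}_{\varepsilon} \big) .
\end{aligned}
\]

Exactly the same computation as in the case (1.) reveals that (6) or (7) holds for all $u \to u'$, and thus for all these transitions we have $r(u) + B^{wuu'}_{\varepsilon} \ge B^{wu}_{\varepsilon}$. Thus, $\inf_{u \to u'} \big( r(u) + B^{wuu'}_{\varepsilon} \big) \ge B^{wu}_{\varepsilon}$ and (5) holds for $wu$.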
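(3.) $u \in V_\bigcirc$. Then again from the induction hypothesis it follows that

\[
\begin{aligned}
\inf_{\pi \in \mathrm{HR}_\Diamond} E^{\sigma_\varepsilon,\pi}_v\big[S^{t}_{|wu|} \mid \mathit{Run}(wu)\big]
 &= r(u) + \sum_{u \xrightarrow{x} u'} x \cdot \Big( \inf_{\pi \in \mathrm{HR}_\Diamond} E^{\sigma_\varepsilon,\pi}_v\big[S^{t}_{|wuu'|} \mid \mathit{Run}(wuu')\big] \Big) \\
 &\ge r(u) + \sum_{u \xrightarrow{x} u'} x \cdot B^{wuu'}_{\varepsilon} \ge B^{wu}_{\varepsilon},
\end{aligned}
\]

where the last inequality can be justified in exactly the same way as in the previous two cases.

□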
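The role of the two thresholds used in Lemma 16 can be checked numerically. The sketch below assumes the reading of $A^{wu}_{\varepsilon}$ and $B^{wu}_{\varepsilon}$ in which the finite case subtracts $\varepsilon/2^{|wu|+1}$ and $\varepsilon/2^{|wu|}$, respectively, and the infinite case uses $1/\varepsilon + \varepsilon \cdot (|wu|+1) + F(w)$ and $1/\varepsilon + \varepsilon \cdot |wu| + F(w)$. The function name `thresholds` and the encoding of an infinite value as `None` are assumptions made for illustration.

```python
from fractions import Fraction

def thresholds(value, eps, n, F):
    """Return (A, B) for a path wu of length n.

    value is L^{d(wu)}(0)_u as a Fraction, or None when it is infinite;
    F stands for F(w).
    """
    if value is not None:
        return value - eps / 2 ** (n + 1), value - eps / 2 ** n
    return 1 / eps + eps * (n + 1) + F, 1 / eps + eps * n + F

eps = Fraction(1, 2)
A_fin, B_fin = thresholds(Fraction(7), eps, n=3, F=Fraction(0))
A_inf, B_inf = thresholds(None, eps, n=3, F=Fraction(4))
```

In the finite case the gap between the two thresholds is exactly $\varepsilon/2^{|wu|+1}$, and in the infinite case it is $\varepsilon$, so one more error term of size at most $\varepsilon/2^{|wu|+1}$ can always be absorbed.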

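C Proof of Lemma 11

Lemma 11. If $G$ is ◇-finitely-branching, then for every $v \in V$ there is $n \in \mathbb{N}$ such that

\[
\sup_{\sigma \in \mathrm{HR}_\Box} \inf_{\pi \in \mathrm{HR}_\Diamond} E^{\sigma,\pi}_v(\mathit{Acc}_n) \ge \mathrm{Val}_{HR}(v) \ominus \frac{\varepsilon}{4} . \tag{8}
\]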

Let $v \in V$ be arbitrary. Without loss of generality, we can assume that $v \in V_\bigcirc$ and that $v$ has only one outgoing transition. If this is not the case, we can simply add a new stochastic vertex $v'$ with a zero reward and a single new transition $v' \to v$. It is clear that if the statement of the lemma holds for $v'$ in this new game, then it holds for $v$ in the original game.

Observe that if every vertex of player ◇ has only finitely many successors, then the operator $L$ is Scott-continuous.
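The finiteness assumption cannot simply be dropped. As an illustration that is not taken from the text, consider a hypothetical ◇ vertex $v$ with zero reward and infinitely many successors $v_0, v_1, \dots$, together with the chain of vectors $d^{n}$ given by $d^{n}_{v_i} = 1$ if $i \le n$ and $0$ otherwise. Then $\inf_i \sup_n d^{n}_{v_i} = 1$ while $\sup_n \inf_i d^{n}_{v_i} = 0$. The Python sketch below checks the witnesses for both values on a finite window; the function name `d` and the window size `N` are assumptions.

```python
def d(n, i):
    """Value of the n-th vector of the chain D at the successor v_i."""
    return 1 if i <= n else 0

N = 50  # finite window inspected by the checks below

# D is an increasing chain, hence directed.
is_chain = all(d(n, i) <= d(n + 1, i) for n in range(N) for i in range(N))

# (sup D) at v_i equals 1, witnessed by the vector with n = i.
sup_at_successor = [d(i, i) for i in range(N)]

# Each single vector has a zero entry at v_{n+1}, so its infimum over successors is 0.
inf_of_vector = [d(n, n + 1) for n in range(N)]
```

With finitely many successors a single vector of the chain eventually dominates all the witnesses, which is the step exploited in case (2.) of the proof of Lemma 17.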

Lemma 17. Let $D \subseteq (\mathbb{R}^{\ge 0}_{\infty})^{V}$ be an arbitrary directed set (i.e. a set such that each pair of elements in $D$ has an upper bound in $D$). Then $L(\sup_{d \in D} d) = \sup_{d \in D} L(d)$.

Proof. The inequality $\ge$ follows immediately from the monotonicity of $L$. So it suffices to prove that for every directed set $D$ and every vertex $v$ we have $L(\sup_{d \in D} d)_v \le \sup_{d \in D} L(d)_v$. Note that $(\sup_{d \in D} d)_v = \sup_{d \in D} d_v$. We consider three cases:

(1.) $v \in V_\Box$. Then we trivially have

\[
L(\sup_{d \in D} d)_v = r(v) + \sup_{v \to v'} \sup_{d \in D} d_{v'} = r(v) + \sup_{d \in D} \sup_{v \to v'} d_{v'} = \sup_{d \in D} L(d)_v .
\]

(2.) $v \in V_\Diamond$. Assume, for the sake of contradiction, that $\inf_{v \to v'} \sup_{d \in D} d_{v'} > \sup_{d \in D} \inf_{v \to v'} d_{v'}$. Then for each of the finitely many transitions $v \to v'$ there is a vector $d(v') \in D$ such that $d(v')_{v'} > \sup_{d \in D} \inf_{v \to v''} d_{v''}$. But since the set $D$ is directed and there are only finitely many $v \to v'$, there is a vector $d^{*} \in D$ such that $d(v') \sqsubseteq d^{*}$ for every successor $v'$ of $v$. We thus have

\[
\sup_{d \in D} \inf_{v \to v'} d_{v'} \ge \inf_{v \to v'} d^{*}_{v'} \ge \inf_{v \to v'} d(v')_{v'} > \inf_{v \to v'} \sup_{d \in D} \inf_{v \to v''} d_{v''} = \sup_{d \in D} \inf_{v \to v'} d_{v'},
\]

a contradiction. (Above, the second inequality follows from the fact that $d(v') \sqsubseteq d^{*}$ for every $v'$, and the first inequality and the last equality are trivial. The third inequality is strict because there are only finitely many successors of $v$.)

(3.) $v \in V_\bigcirc$. Then we again trivially have

\[
L(\sup_{d \in D} d)_v = r(v) + \sum_{v \to v'} \mathit{Prob}(v)(v,v') \cdot \sup_{d \in D} d_{v'} = \sup_{d \in D} \Big( r(v) + \sum_{v \to v'} \mathit{Prob}(v)(v,v') \cdot d_{v'} \Big) = \sup_{d \in D} L(d)_v .
\]

□

From the Kleene fixed-point theorem it follows that $L^{\omega}(\mathbf{0}) = K$, i.e. that the ordinal number $\alpha$ from Lemmas 13 and 15 can be assumed to be equal to $\omega$. Fix a labeling $d$ of finite paths starting in $v$ that satisfies the conditions (a)-(c) in Lemma 13 (or Lemma 15, if there are some vertices with infinite HR-value). Then $v$ is labeled by $\omega$ and all other elements of $\mathit{Fpath}(v)$ are labeled with nonnegative integers. Recall that $t(\omega)$ denotes the least $k$ such that $d(\omega(0), \dots, \omega(k)) = 0$.
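The Kleene iteration $L^{0}(\mathbf{0}), L^{1}(\mathbf{0}), L^{2}(\mathbf{0}), \dots$ is easy to run on a small finite game. The following Python sketch is an illustration only: the game `GAME`, its vertex names and the helpers `apply_L` and `kleene` are hypothetical, and rewards and probabilities are exact rationals. In this toy game the least fixed point is $K_s = K_m = 2$ and $K_t = 0$, and the iterates approach it from below without reaching it at any finite stage.

```python
from fractions import Fraction

# Hypothetical game: vertex -> (owner, reward, successors).
GAME = {
    "m": ("max", Fraction(0), ["s", "t"]),
    "s": ("rand", Fraction(1), {"s": Fraction(1, 2), "t": Fraction(1, 2)}),
    "t": ("rand", Fraction(0), {"t": Fraction(1)}),
}

def apply_L(x):
    """One application of the operator L to a vector x of vertex values."""
    out = {}
    for v, (owner, reward, succ) in GAME.items():
        if owner == "max":
            out[v] = reward + max(x[u] for u in succ)
        elif owner == "min":
            out[v] = reward + min(x[u] for u in succ)
        else:  # stochastic vertex: succ maps successors to probabilities
            out[v] = reward + sum(p * x[u] for u, p in succ.items())
    return out

def kleene(n):
    """Return the n-th iterate L^n(0)."""
    x = {v: Fraction(0) for v in GAME}
    for _ in range(n):
        x = apply_L(x)
    return x

x10 = kleene(10)
```

The value at `s` after $n$ steps is $2 - 2^{1-n}$, so every finite horizon underestimates $K_s$, but the error can be made smaller than any prescribed bound by choosing $n$ large enough.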

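Now let $u$ be the unique successor of $v$. We set $n = d(vu) + 1$. To see that this $n$ satisfies (8), consider the deterministic $(\varepsilon/8)$-HR-optimal strategy $\sigma_{\varepsilon/8}$ constructed in the proof of Lemma 6. From Lemma 13 (or Lemma 15) it follows that

\[
\inf_{\pi \in \mathrm{HR}_\Diamond} E^{\sigma_{\varepsilon/8},\pi}_v\Big[\sum_{i=0}^{t(\omega)} r(\omega(i)) \;\Big|\; \mathit{Run}(v)\Big] \ge \mathrm{Val}_{HR}(v) \ominus \frac{\varepsilon}{8} .
\]

But now we clearly have $t(\omega) \le n = d(vu) + 1$ for all runs $\omega$ starting in $v$. Thus, we have

\[
\inf_{\pi \in \mathrm{HR}_\Diamond} E^{\sigma_{\varepsilon/8},\pi}_v(\mathit{Acc}_n) \ge \inf_{\pi \in \mathrm{HR}_\Diamond} E^{\sigma_{\varepsilon/8},\pi}_v\Big[\sum_{i=0}^{t(\omega)} r(\omega(i)) \;\Big|\; \mathit{Run}(v)\Big] \ge \mathrm{Val}_{HR}(v) \ominus \frac{\varepsilon}{8} \ge \mathrm{Val}_{HR}(v) \ominus \frac{\varepsilon}{4} .
\]

This finishes the proof of Lemma 11.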

D MD-optimal strategies for player ◇

We prove Item 2 of Proposition 8, i.e. the fact that for every ◇-finitely-branching game $G$ there is $\pi \in \mathrm{MD}_\Diamond$ such that $\pi$ is HR-optimal in every vertex. We have already defined $\pi$ as follows: in every state $v \in V_\Diamond$, the strategy $\pi$ chooses a successor $u$ minimizing $\mathrm{Val}_{HR}(u)$ among all successors of $v$. But the HR-optimality of this strategy immediately follows from Lemma 12 (note that this lemma works for $\varepsilon = 0$) and Lemma 5 (which says that the least fixed point $K$ of $L$ is equal to the vector of HR-values).
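The memoryless strategy described above amounts to a simple table lookup. The following Python sketch assumes that the HR-values are already known and stored in a dictionary `val`, and that every vertex of player ◇ has finitely many successors listed in `succ`, so that a minimizing successor exists; all names are hypothetical.

```python
def md_min_strategy(min_vertices, succ, val):
    """Memoryless deterministic choice: a successor with the smallest value."""
    return {v: min(succ[v], key=lambda u: val[u]) for v in min_vertices}

# Hypothetical data: one Min vertex with three successors, one of infinite value.
succ = {"v": ["a", "b", "c"]}
val = {"a": 3.0, "b": 1.0, "c": float("inf")}
pi = md_min_strategy(["v"], succ, val)
```

The choice depends only on the current vertex and not on the history, which is what makes the resulting strategy memoryless and deterministic.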